The Big Data Newsvendor: Practical Insights from Machine Learning Analysis
Abstract
We present a version of the newsvendor problem in which one has n observations of p features as well as past demand. We consider both "big data" (p/n = O(1)) and small data (p/n = o(1)). For small data, we provide a machine learning algorithm, formulated as a linear program, that yields an asymptotically optimal order quantity. We also derive a generalization bound based on algorithmic stability, which gives an upper bound on the expected out-of-sample cost. For big data, we propose a regularized version of the algorithm to address the curse of dimensionality. A generalization bound is derived for this case as well, bounding the out-of-sample cost by a quantity that depends on n and the amount of regularization. We apply the algorithm to the newsvendor cost of nurse staffing, using data from the emergency room of a large teaching hospital, and show that (i) incorporating appropriate features can reduce the out-of-sample cost by up to 23% relative to the featureless Sample Average Approximation approach, and (ii) regularization can automate feature selection while controlling the out-of-sample cost. With an appropriate choice of the newsvendor underage and overage costs, our results also apply to quantile regression.
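As a point of reference for the featureless Sample Average Approximation (SAA) baseline mentioned in the abstract, the following is a minimal sketch and not the authors' feature-based algorithm. It assumes a per-unit underage cost `b` and overage cost `h` (names chosen here for illustration) and uses the standard result that the empirical newsvendor cost is minimized at the b/(b+h) sample quantile of past demand, which is also where the link to quantile regression comes from.

```python
import math

def newsvendor_cost(q, demand, b, h):
    """Cost of ordering q when demand is realized: b per unit short, h per unit left over."""
    return b * max(demand - q, 0.0) + h * max(q - demand, 0.0)

def average_cost(q, demands, b, h):
    """Empirical (in-sample) average newsvendor cost of order quantity q."""
    return sum(newsvendor_cost(q, d, b, h) for d in demands) / len(demands)

def saa_order_quantity(demands, b, h):
    """Featureless SAA: smallest observed demand whose empirical CDF reaches b / (b + h)."""
    ordered = sorted(demands)
    n = len(ordered)
    rank = math.ceil(n * b / (b + h))   # 1-based rank of the target quantile
    rank = min(max(rank, 1), n)         # keep the rank inside the sample
    return ordered[rank - 1]

if __name__ == "__main__":
    past_demand = [12, 7, 9, 15, 10, 8, 11, 14]   # illustrative numbers, not from the paper
    q = saa_order_quantity(past_demand, b=3.0, h=1.0)
    print(q, average_cost(q, past_demand, b=3.0, h=1.0))
```

The feature-based approach described in the abstract replaces the single quantity `q` with a function of the observed features; the sketch above covers only the baseline it is compared against.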
Publication date: 2013